Abstract. It is generally hard to count, or even estimate, how many integer
points lie in a polytope P. Barvinok and Hartigan have approached the problem 
by way of information theory, showing how to efficiently compute a random vector 
which samples the integer points of P with (computable) constant mass, but which 
may also land outside P. Thus, to count the integer points of P, it suffices to 
determine the frequency with which the random vector falls in P. 

We prove a collection of efficiently computable upper bounds on this frequency. 
We also show that if P is suitably presented by n linear inequalities and m linear 
equations (m fixed), then under mild conditions separating the expected value 
of the above random vector from the origin, the frequency with which it falls in 
P is $O(n^{-m/2})$ as $n \to \infty$. As in the classical Littlewood-Offord problem, all
results in the paper are obtained by bounding the point concentration of a sum 
of independent random variables; we sketch connections to previous work on the 
subject. 



1. Introduction 

The problem of counting integer points in polytopes has been extensively studied, 
and appears to be quite difficult in general. It is NP-hard to determine whether an 
arbitrary integral polytope with n facets contains an integer point at all [10] . Given 
this state of affairs, attention has largely shifted to approximating or bounding the 
number of integer points in a polytope, and the closely related problem of sampling 
almost uniformly from the set of integer points in a polytope. 

For certain classes of polytopes, almost uniform sampling has been achieved by 
specially constructed Markov chains with good mixing properties. One notable 
success of this method is due to Jerrum, Sinclair, and Vigoda, who in [13] construct
a fully polynomial randomized approximation scheme for the permanent of a 0- 
1 matrix (equal to the number of integer points in a perfect matching polytope). 
However, for a general polytope P, it is not known how to efficiently generate Markov 
chains which sample almost uniformly from the integer points in P. A survey of 
this and other approaches to the problem can be found in [5], [6].

In [1], Barvinok and Hartigan proposed a new approach to the problem using the
principle of maximum entropy. Given a polytope $P \subset \mathbb{R}^n$ defined by the inequalities

$$x_1 \ge 0,\ x_2 \ge 0,\ \ldots,\ x_n \ge 0, \qquad Ax = b,$$

where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, they introduce a random vector $X = (X_1, X_2, \ldots, X_n)$
of maximum entropy, subject to the constraints that all coordinates are distributed
on $\mathbb{Z}_{\ge 0}$ and that $E[AX] = b$ (i.e., the mean of $X$ lies in $P$). This random vector
has constant mass $e^{-H(X)}$ on all points of $P \cap \mathbb{Z}^n$, where $H(X)$ is the entropy of the
random vector, defined by

$$H(X) := - \sum_{k_1, \ldots, k_n \ge 0} \Pr[X = (k_1, \ldots, k_n)] \ln \Pr[X = (k_1, \ldots, k_n)].$$

Thus $X$ is, in a sense, a good approximation of the uniform distribution on $P \cap \mathbb{Z}^n$.
However, not all of the mass of $X$ lies in $P$; thus we have

$$|P \cap \mathbb{Z}^n| = e^{H(X)} \Pr[X \in P].$$

As it turns out [1], the coordinates of $X$ are independent and geometrically distributed; that is, there exist $q_j \in [0, 1)$, $1 \le j \le n$, so that

$$\Pr[X_j = k] = (1 - q_j) q_j^k \quad \text{for } k \in \mathbb{Z}_{\ge 0}.$$

After a change of parameter $z_j := E[X_j] = \frac{q_j}{1 - q_j}$, the entropy $H(X)$ may be written as

$$(1) \qquad H(X) = \sum_{j=1}^{n} \big[ (z_j + 1) \ln(z_j + 1) - z_j \ln z_j \big].$$

This is a strictly concave function of $z_1, \ldots, z_n$, so it can be maximized efficiently
by (e.g.) interior point methods (for details, see [1]). Thus the parameters $q_j$, and
with them the distribution and entropy of $X$, are efficiently computable. Hence,
the outstanding question is how to bound the factor $\Pr[X \in P]$, particularly under
weak assumptions (i.e., when a local central limit theorem is not feasible). This
paper offers several upper bounds.
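
As a quick numerical sanity check of formula (1) in the case $n = 1$, the following Python sketch compares the closed-form entropy $(z + 1)\ln(z + 1) - z \ln z$ with the entropy summed directly from the geometric mass function. The function names and the sample mean $z = 2.5$ are illustrative choices of ours, not taken from the paper.

```python
import math

def geometric_entropy_from_mean(z):
    """Entropy (in nats) of a geometric variable on {0, 1, 2, ...} with mean z,
    using the closed form (z + 1) ln(z + 1) - z ln z."""
    if z == 0:
        return 0.0
    return (z + 1) * math.log(z + 1) - z * math.log(z)

def geometric_entropy_direct(q, terms=10_000):
    """Entropy summed directly from the pmf Pr[X = k] = (1 - q) q^k (truncated sum)."""
    total = 0.0
    for k in range(terms):
        p = (1 - q) * q**k
        if p == 0.0:          # remaining terms have underflowed to zero
            break
        total -= p * math.log(p)
    return total

z = 2.5                       # hypothetical mean, for illustration only
q = z / (1 + z)               # parameter with E[X] = q / (1 - q) = z
h_closed = geometric_entropy_from_mean(z)
h_direct = geometric_entropy_direct(q)
print(h_closed, h_direct)
```

The two printed values should agree to within floating-point error, since the truncated tail of the sum is negligible.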

2. Summary of results 

2.1. Definitions and notation. Throughout this paper, $A$ always denotes an $m \times n$
matrix with real entries; we assume that $n > m$ and that $\mathrm{rank}(A) = m$. We denote
the columns of $A$ by $a_1, a_2, \ldots, a_n$. The random vector $X = (X_1, X_2, \ldots, X_n)$ is
defined as in the introduction, so as to maximize the entropy $H(X)$ subject to the
constraint $E[AX] = b = (b_1, b_2, \ldots, b_m) \in \mathbb{R}^m$. We define the parameters $q_j$, $z_j$ as
in the introduction.

We define the point concentration of a discrete random variable $Y$ by

$$\mathrm{conc}(Y) := \max_y \Pr[Y = y].$$

An upper bound on $\mathrm{conc}(AX)$ is, necessarily, also an upper bound on
$\Pr[AX = b] = \Pr[X \in P]$. Therefore, we have

$$(2) \qquad |P \cap \mathbb{Z}^n| \le e^{H(X)} \, \mathrm{conc}(AX).$$

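2.2. Results. Under the hypotheses above, we prove:

Theorem 1.

$$|P \cap \mathbb{Z}^n| \le e^{H(X)} \min_{a_{j_1}, \ldots, a_{j_m} \text{ lin. indep.}} (1 - q_{j_1})(1 - q_{j_2}) \cdots (1 - q_{j_m}) = e^{H(X)} \min_{a_{j_1}, \ldots, a_{j_m} \text{ lin. indep.}} \prod_{i=1}^{m} \frac{1}{z_{j_i} + 1}.$$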



Corollary 1. Let $I_1, I_2, \ldots, I_p$ be $m$-element subsets of $\{1, 2, \ldots, n\}$,

$$I_k = \{j_{k1}, j_{k2}, \ldots, j_{km}\},$$

such that $a_{j_{k1}}, \ldots, a_{j_{km}}$ form a basis for $\mathbb{R}^m$ ($1 \le k \le p$), and such that
$I_1 \cup I_2 \cup \cdots \cup I_p = \{1, 2, \ldots, n\}$. Then

$$|P \cap \mathbb{Z}^n| \le e^{H(X)} \left( \frac{1}{E[\tilde{X}] + 1} \right)^{m},$$

where $\tilde{X}$ is a geometrically distributed random variable with entropy equal to $\frac{1}{pm} H(X)$.

(A formula for the entropy of a geometrically distributed random variable is given
in section 1, (1).)

Theorem 2. Suppose that $n = pm$ for some integer $p$, that $A$ has integer entries,
and that $a_{(k-1)m+1}, a_{(k-1)m+2}, \ldots, a_{km}$ are linearly independent for $1 \le k \le p$. Assume that $\langle a_j, b \rangle > 0$ for $1 \le j \le n$. Define

$$q_i^{\vee} := \min\{q_{(k-1)m+i} : 1 \le k \le p\} \qquad (1 \le i \le m).$$

Then there exist constants $C = C(q_1^{\vee}, \ldots, q_m^{\vee})$ and $C' = C'(q_1^{\vee}, \ldots, q_m^{\vee})$, with $C' < 1$,
such that

$$|P \cap \mathbb{Z}^n| \le e^{H(X)} \left( C p^{-m/2} + (C')^{p} \right).$$

(In fact, there is a one-parameter family of pairs of constants $(C, C')$ for which this
statement holds. Explicit formulas and bounds for $C$ and $C'$ are provided in section
5.)

Theorem 3. Suppose that $n = pm$ for some integer $p$ and that, for each $i = 1, 2, \ldots, m$, we have $a_i = a_{m+i} = a_{2m+i} = \cdots = a_{(p-1)m+i}$, where $\{a_1, a_2, \ldots, a_m\}$
is a basis for $\mathbb{R}^m$. (That is to say, the columns of $A$ cycle through a basis of $\mathbb{R}^m$
periodically.) Then

$$|P \cap \mathbb{Z}^n| \lesssim e^{H(X)} \prod_{i=1}^{m} \Big( \frac{\pi}{6} \sum_{j \equiv i \ (\mathrm{mod}\ m)} \big( \lfloor E[X_j] + 1 \rfloor^2 - 1 \big) \Big)^{-1/2}.$$

(Here $\lesssim$ means that, given fixed $a_1, a_2, \ldots, a_m$, the expression on the left side is
bounded above by a function which is asymptotic to the expression on the right side
as $p \to \infty$.)

2.3. Plan of paper. In section 3, we discuss these results in the context of prior
work, and give examples of their use. In section 4, we prove Theorem 1 and Corollary 1. The most substantial portion of the paper is section 5, in which we prove Theorem 2, then bound the constants appearing in it. In section 6, we prove Theorem 3.

3. Discussion and examples 

The concentration of sums of random variables is a richly studied subject. The
particular program of obtaining upper bounds, sometimes called "anti-concentration
results," may be considered to have originated with the Littlewood-Offord problem.
This problem asked for the maximum concentration of

$$\epsilon_1 a_1 + \epsilon_2 a_2 + \cdots + \epsilon_n a_n$$

when $a_1, a_2, \ldots, a_n$ are integers and $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are symmetric Bernoulli random
variables. The exact solution, which is of order $O(n^{-1/2})$, was provided by Erdős [9].

Halász [11] extended this result to random sums of $m$-vectors (again with symmetric
Bernoulli coefficients), obtaining a bound of order $O(n^{-m/2})$ under conditions ensuring that the vectors are reasonably "spread out" in $\mathbb{R}^m$ (i.e., not excessively close
to a proper subspace). Halász's results pertain to the small ball concentration of
$\epsilon_1 a_1 + \epsilon_2 a_2 + \cdots + \epsilon_n a_n$, but can be specialized to point concentration. These results,
which Halász proved using a Fourier-theoretic lemma of Esseen, were subsequently
reproduced by Oskolkov [12, notes by Howard] using rearrangement inequalities.
Theorem 2, herein, arrives at a similar conclusion when the Bernoulli coefficients
are replaced by geometric ones. In particular, Theorem 2 implies the following
Gaussian-like asymptotics:

Corollary 2. Suppose that a subset of the columns of $A$ can be partitioned into $p$
bases for $\mathbb{R}^m$. Then for $\min_j q_j$ bounded away from 0, the point concentration of $AX$
is $O(p^{-m/2})$ as $p \to \infty$.

Our proof of Theorem 2 hews closely to the method of [12]. For other approaches
to anti-concentration inequalities, see [15], [16].

Theorem 2 is essentially an asymptotic result; although we give explicit formulas
for $C$ and $C'$, the bounds obtained from Theorem 2 are typically only strong when
$p$ is large, i.e., when $n \gg m$. (For further remarks on this theme, see the end of
section 5.1.) By contrast, Theorem 1 and its corollary are non-asymptotic, and are
apparently most effective when $n \not\gg m$. They are also relatively straightforward,
but do not capture the $O(p^{-m/2})$ behavior of $\mathrm{conc}(AX)$. Thus, Theorem 1 and
Theorem 2 may be seen as filling somewhat different niches. Theorem 3 gives a
more ideal bound, combining all the attractive features of Theorems 1 and 2, but is
pertinent only to a very special case (the easiest, where a local central limit theorem
is available). Relying for its proof on notions from the theory of partially ordered
sets, Theorem 3 may serve as a suggestion of how combinatorics can be brought to
bear on this problem.

3.1. Examples. Given nonnegative vectors $R \in \mathbb{R}^r$, $C \in \mathbb{R}^s$, the transportation
polytope $U(R, C)$ is defined as the set of all nonnegative $r \times s$ matrices whose row
sums and column sums are the coordinates of $R$ and $C$, respectively. Such a matrix
with integer entries is called a contingency table.

We may use Theorem 1 to bound the number of $4 \times 4$ contingency tables with
given "margins" $R$ and $C$. For example, let $R = (108, 286, 71, 127)$ and $C = (220, 215, 93, 64)$, as in a table appearing in [7] which has become a standard benchmark in the literature on contingency tables. The actual number of tables with these
row and column sums is $1.23 \times 10^{15}$. (It can be computed exactly, as the dimension
is fairly low: the defining matrix $A$ for $U(R, C)$ is $7 \times 16$.)

Let $X$ be a random matrix taking the maximum-entropy distribution on $\mathbb{Z}_{\ge 0}^{4 \times 4}$, under
the constraint that $E[X] \in U(R, C)$. Solving the convex optimization problem
described in section 1, we compute



$$E[X] = \begin{pmatrix} 36.4 & 36.0 & 20.6 & 14.9 \\ 117.2 & 113.4 & 34.3 & 21.2 \\ 22.2 & 22.0 & 15.1 & 11.7 \\ 44.2 & 43.6 & 23.0 & 16.2 \end{pmatrix}$$

and $e^{H(X)} = 2.96 \times 10^{30}$. Theorem 1 then gives

$$|P \cap \mathbb{Z}^n| \le \frac{2.96 \times 10^{30}}{(1 + 36.4)(1 + 117.2)(1 + 113.4)(1 + 34.3)(1 + 21.2)(1 + 22.2)(1 + 44.2)} = 7.14 \times 10^{18},$$

off by a factor of about 5800. Computation of similar examples suggests that the rel- 
ative error depends mainly on the dimensions of R and C, and not on the magnitude 
of their entries. 
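
The choice of the seven factors in the bound above can be reproduced mechanically. The Python sketch below is an illustration under stated assumptions, not the procedure used for the computation above. It relies on two standard facts that are not spelled out in the text: columns of the constraint matrix of a transportation polytope are linearly independent exactly when the corresponding cells form a forest in the bipartite row/column graph, and, because linearly independent column sets form a matroid, a greedy choice of the largest $z_{ij}$ minimizes the product in Theorem 1. The names `find` and `best_forest` are hypothetical.

```python
import math

# Expected values E[X] from the 4 x 4 example (rows sum to R, columns to C).
Z = [
    [36.4, 36.0, 20.6, 14.9],
    [117.2, 113.4, 34.3, 21.2],
    [22.2, 22.0, 15.1, 11.7],
    [44.2, 43.6, 23.0, 16.2],
]
EXP_ENTROPY = 2.96e30   # e^{H(X)} as quoted in the text
EXACT_COUNT = 1.23e15   # exact number of tables as quoted in the text

def find(parent, v):
    """Union-find root lookup with path halving."""
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v

def best_forest(Z):
    """Kruskal-style greedy choice of cells (i, j) with the largest z_ij such that
    the chosen cells contain no cycle in the row/column bipartite graph."""
    r, s = len(Z), len(Z[0])
    parent = list(range(r + s))          # rows are 0..r-1, columns are r..r+s-1
    cells = sorted(((Z[i][j], i, j) for i in range(r) for j in range(s)), reverse=True)
    chosen = []
    for z, i, j in cells:
        a, b = find(parent, i), find(parent, r + j)
        if a != b:                       # adding this cell creates no cycle
            parent[a] = b
            chosen.append(z)
    return chosen

chosen = best_forest(Z)
bound = EXP_ENTROPY / math.prod(1 + z for z in chosen)
print(sorted(chosen), bound, bound / EXACT_COUNT)
```

With the rounded inputs above, the greedy selection picks the same seven entries that appear in the displayed bound, and the resulting value differs from the quoted one only through rounding of the inputs.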

Theorem 2 performs relatively poorly in these examples, but is much more effec- 
tive than Theorem 1 when n is large compared to m. For instance, consider the 
simplex 

$$\Sigma_n(r) := \{(x_1, \ldots, x_n) : x_1, \ldots, x_n \ge 0,\ \|x\|_1 = r\},$$

which has integer points. Let < 5 < |. Then, choosing 7 = 77= in the 

statement of Theorem 2a (see section 5), one obtains as a conclusion an upper bound
on $|\Sigma_n(r) \cap \mathbb{Z}^n|$ which is precisely asymptotic to $\binom{n+r-1}{r}$ as $n \to \infty$, if $r$ grows as
$\Theta(n^{\epsilon})$ for some $\epsilon \in (0, 1)$.

For $r = 10$ and $n = 1000$, the optimal result of Theorem 2a (achieved when $\gamma = 0.172$) is an upper bound of $3.14 \times 10^{23}$, which may be compared with an exact
count of $2.88 \times 10^{23}$ integer points. By comparison, when $r = 100$ and $n = 10000$,
the optimal result of Theorem 2a (achieved when $\gamma = 0.0645$) is an upper bound of
$1.774 \times 10^{242}$ integer points; the exact count is $1.755 \times 10^{242}$, and the relative error
is about 1.1%.
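
The exact counts quoted above are binomial coefficients (the number of ways to write $r$ as an ordered sum of $n$ nonnegative integers), so they can be checked directly. A minimal Python sketch follows; the helper name `simplex_points` is our own.

```python
import math

def simplex_points(n, r):
    """Number of nonnegative integer vectors (x_1, ..., x_n) with x_1 + ... + x_n = r."""
    return math.comb(n + r - 1, r)

small = simplex_points(1000, 10)
large = simplex_points(10000, 100)
print(f"{small:.3e}", f"{large:.3e}")
```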



4. Proofs of Theorem 1 and Corollary 1 



We prove Theorem 1 by means of the following simple fact:

Lemma 1. If $X$, $Y$ are independent, discrete random variables, then $\mathrm{conc}(X + Y) \le \mathrm{conc}(X)$.

Proof. Observe that $\mathrm{conc}(X + Y)$ is a weighted average of values of the probability
mass function of $X$, of which the largest is $\mathrm{conc}(X)$. □
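
Lemma 1 is easy to observe numerically. The following Python sketch convolves two probability mass functions and compares point concentrations; the distributions and the helper names `conc` and `convolve` are hypothetical, chosen only for illustration.

```python
def conc(pmf):
    """Point concentration: the largest probability mass in a pmf given as a dict."""
    return max(pmf.values())

def convolve(pmf_x, pmf_y):
    """pmf of X + Y for independent X and Y."""
    out = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

# Hypothetical example distributions (not taken from the paper).
pmf_x = {0: 0.5, 1: 0.3, 4: 0.2}
pmf_y = {0: 0.25, 2: 0.25, 3: 0.5}
pmf_sum = convolve(pmf_x, pmf_y)
print(conc(pmf_sum), conc(pmf_x), conc(pmf_y))
```

By symmetry the same inequality holds with the roles of $X$ and $Y$ exchanged, so the concentration of the sum is no larger than either individual concentration.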

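Proof of Theorem 1. Using Lemma 1 and the previously mentioned properties of
geometric random variables,

$$\begin{aligned}
\mathrm{conc}(X_1 a_1 + \cdots + X_n a_n) &\le \min_{a_{j_1}, \ldots, a_{j_m} \text{ lin. indep.}} \mathrm{conc}(X_{j_1} a_{j_1} + \cdots + X_{j_m} a_{j_m}) \\
&\le \min_{a_{j_1}, \ldots, a_{j_m} \text{ lin. indep.}} \Pr[X_{j_1} = \cdots = X_{j_m} = 0] \\
&= \min_{a_{j_1}, \ldots, a_{j_m} \text{ lin. indep.}} (1 - q_{j_1})(1 - q_{j_2}) \cdots (1 - q_{j_m}) \\
&= \min_{a_{j_1}, \ldots, a_{j_m} \text{ lin. indep.}} \prod_{i=1}^{m} \frac{1}{z_{j_i} + 1}.
\end{aligned}$$

By section 2.1, (2), it follows that

$$|P \cap \mathbb{Z}^n| \le e^{H(X)} \min_{a_{j_1}, \ldots, a_{j_m} \text{ lin. indep.}} (1 - q_{j_1})(1 - q_{j_2}) \cdots (1 - q_{j_m}) = e^{H(X)} \min_{a_{j_1}, \ldots, a_{j_m} \text{ lin. indep.}} \prod_{i=1}^{m} \frac{1}{z_{j_i} + 1}.$$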

To prove Corollary 1, we will require the following fact, whose proof is deferred until after the
proof of Corollary 1:

Lemma 2. Among all vectors $Y := (Y_1, Y_2, \ldots, Y_m)$ of independent, geometrically
distributed random variables with fixed joint entropy $\Omega$, the highest concentration
$\mathrm{conc}(Y)$ is achieved when $Y_1, Y_2, \ldots, Y_m$ are identically distributed.

Proof of Corollary 1. For $I \subseteq \{1, 2, \ldots, n\}$, let $H(X_I)$ denote the joint entropy
of $\{X_j : j \in I\}$. Since $X_1, \ldots, X_n$ are independent, we have

$$H(X_I) = \sum_{j \in I} H(X_j).$$

Since the sets $I_1, I_2, \ldots, I_p$ cover $\{1, 2, \ldots, n\}$, we have

$$H(X) \le \sum_{k=1}^{p} H(X_{I_k}),$$

and thus by the pigeonhole principle

$$H(X_{I_k}) \ge \frac{1}{p} H(X)$$

for some $k \in \{1, \ldots, p\}$. By Lemma 2, the concentration of the vector $(X_{j_{k1}}, \ldots, X_{j_{km}})$
is maximized when $X_{j_{k1}}, \ldots, X_{j_{km}}$ are identically distributed. In this case, each has
entropy equal to $\frac{1}{m} H(X_{I_k})$, which is greater than or equal to $H(\tilde{X}) = \frac{1}{pm} H(X)$; we
pause to note that the entropy and the expectation of a geometric random variable
are monotonically increasing functions of one another. Thus (as in the proof of
Theorem 1),

$$\mathrm{conc}(AX) \le \mathrm{conc}(X_{j_{k1}} a_{j_{k1}} + \cdots + X_{j_{km}} a_{j_{km}}) \le \left( \frac{1}{E[\tilde{X}] + 1} \right)^{m},$$

so Corollary 1 follows by section 2.1, (2). ■

Proof of Lemma 2. Since $Y_i$ is geometrically distributed ($1 \le i \le m$), there exist
parameters $r_i \in [0, 1)$ such that

$$\Pr[Y_i = k] = (1 - r_i) r_i^k \quad \text{for } k \in \mathbb{Z}_{\ge 0}.$$

The concentration of $Y$ is $\prod_{i=1}^{m} (1 - r_i)$, so we must show that this expression is
maximized (for fixed $\Omega$) when $r_1 = \cdots = r_m$.

We introduce the changes of variable $s_i := \frac{1}{1 - r_i}$ and $t_i := \ln s_i$. (Thus $1 - r_i = \frac{1}{s_i}$, and
$s_i = e^{t_i}$, where $t_i \in [0, \infty)$.) Also, let

$$\omega(t) := (1 - e^{t}) \ln(1 - e^{-t}) + t.$$

Now

$$\begin{aligned}
\Omega &= \sum_{i=1}^{m} \Big[ \frac{r_i}{1 - r_i} \ln \frac{1}{r_i} + \ln \frac{1}{1 - r_i} \Big] \\
&= \sum_{i=1}^{m} \Big[ (s_i - 1) \ln \frac{s_i}{s_i - 1} + \ln s_i \Big] \\
&= \sum_{i=1}^{m} \Big[ (e^{t_i} - 1) \ln \frac{e^{t_i}}{e^{t_i} - 1} + t_i \Big] \\
&= \sum_{i=1}^{m} \Big[ (1 - e^{t_i}) \ln(1 - e^{-t_i}) + t_i \Big] \\
&= \sum_{i=1}^{m} \omega(t_i)
\end{aligned}$$

and

$$\prod_{i=1}^{m} (1 - r_i) = \exp\Big( - \sum_{i=1}^{m} t_i \Big).$$

The following three statements are equivalent:

(1) For $\Omega$ fixed, $\prod_i (1 - r_i)$ is maximized when $r_1 = \cdots = r_m$.

(2) For $\Omega$ fixed, $\sum_i t_i$ is minimized when $t_1 = \cdots = t_m$.

(3) If $\sum_i t_i$ is fixed and $\Omega$ free to vary, then $\Omega$ is maximized when $t_1 = \cdots = t_m$.

The equivalence of statements (1) and (2) is clear. To see that (2) and (3) are
equivalent, it is enough to observe that $\Omega$ is increasing with respect to each of
$t_1, \ldots, t_m$. Thus to prove (1), which is the assertion of the lemma, it will suffice for
us to prove (3).

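Writing $s := e^{t}$, we obtain

$$\frac{d\omega}{dt} = -e^{t} \ln(1 - e^{-t}) = -s \ln\Big(1 - \frac{1}{s}\Big)$$

and

$$\begin{aligned}
\frac{d^2\omega}{dt^2} &= -\frac{1}{1 - e^{-t}} - e^{t} \ln(1 - e^{-t}) \\
&= -\frac{1}{1 - \frac{1}{s}} - s \ln\Big(1 - \frac{1}{s}\Big) \\
&= -\frac{s}{s - 1} + s \ln \frac{s}{s - 1} \\
&= -\frac{s}{s - 1} + s \ln\Big(1 + \frac{1}{s - 1}\Big) \\
&< 0,
\end{aligned}$$

since $\ln(1 + x) < x$ for $x > 0$. This shows that $\omega$ is concave for $t > 0$, which
implies (3) and so completes the proof of the lemma. ■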
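
Lemma 2 can be illustrated numerically for $m = 2$. The Python sketch below fixes a joint entropy, splits it between two geometric variables in several ways, and compares the resulting concentrations $e^{-(t_1 + t_2)}$. It takes the entropy of a geometric variable with $1 - r = e^{-t}$ to be $(1 - e^{t})\ln(1 - e^{-t}) + t$; the total entropy 3.0, the splits, and the bisection helper `solve_t` are hypothetical choices of ours.

```python
import math

def omega(t):
    """Entropy of a geometric variable with 1 - r = e^{-t}:
    (1 - e^t) ln(1 - e^{-t}) + t, with the limiting value 0 at t = 0."""
    if t <= 0.0:
        return 0.0
    return (1.0 - math.exp(t)) * math.log1p(-math.exp(-t)) + t

def solve_t(target, lo=0.0, hi=20.0):
    """Bisection for omega(t) = target, relying on omega being increasing."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if omega(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

OMEGA_TOTAL = 3.0                        # hypothetical fixed joint entropy, m = 2
t_equal = solve_t(OMEGA_TOTAL / 2)
conc_equal = math.exp(-2 * t_equal)      # Y_1, Y_2 identically distributed

conc_unequal = []
for share in (0.1, 0.25, 0.4):           # fraction of the entropy given to Y_1
    t1 = solve_t(share * OMEGA_TOTAL)
    t2 = solve_t((1 - share) * OMEGA_TOTAL)
    conc_unequal.append(math.exp(-(t1 + t2)))
print(conc_equal, conc_unequal)
```

In each unequal split the concentration comes out smaller than in the equal split, in line with the lemma.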

5. Proof of Theorem 2 



We begin by restating the theorem with explicit formulas for all constants: 

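Theorem 2a. Assume the definitions and notation from section 2.1.

Suppose that $n = pm$ for some integer $p$, that $A$ has integer entries, and that
$a_{(k-1)m+1}, a_{(k-1)m+2}, \ldots, a_{km}$ are linearly independent for $1 \le k \le p$. Assume that
$\langle a_j, b \rangle > 0$ for $1 \le j \le n$. Let $\gamma > 0$. Define constants

$$\alpha_j := \frac{2 q_j}{(1 - q_j)^2} \qquad (1 \le j \le n),$$

$$\alpha_i^{\vee} := \min\{\alpha_{(k-1)m+i} : 1 \le k \le p\} \qquad (1 \le i \le m),$$

$$q_i^{\vee} := \min\{q_{(k-1)m+i} : 1 \le k \le p\} \qquad (1 \le i \le m),$$

$$c_i := \max\left\{ \frac{1}{\gamma^2} \ln\Big[ 1 + \alpha_i^{\vee} \Big( 1 - \cos \frac{\gamma}{\sqrt{\alpha_i^{\vee}}} \Big) \Big],\ \frac{1}{\alpha_i^{\vee} \pi^2} \ln\big[ 1 + 2 \alpha_i^{\vee} \big] \right\} \qquad (1 \le i \le m),$$

$$C := (2\pi)^{-m/2} \prod_{i=1}^{m} (c_i \alpha_i^{\vee})^{-1/2},$$

$$C' := \max_{1 \le i \le m} e^{-\gamma^2 c_i / 2}.$$

Then

$$|P \cap \mathbb{Z}^n| \le e^{H(X)} \left( C p^{-m/2} + (C')^{p} \right).$$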
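
To illustrate how the constants of Theorem 2a are evaluated, the following Python sketch specializes them to the simplex $\Sigma_n(r)$ of section 3.1, where $m = 1$, $p = n$, and every coordinate has mean $z = r/n$. It assumes the constants read as $\alpha = 2q/(1-q)^2$, $c = \max\{\gamma^{-2}\ln[1 + \alpha(1 - \cos(\gamma/\sqrt{\alpha}))],\ (\alpha\pi^2)^{-1}\ln[1 + 2\alpha]\}$, $C = (2\pi c\alpha)^{-1/2}$ and $C' = e^{-\gamma^2 c/2}$; the function name is hypothetical.

```python
import math

def theorem_2a_bound_simplex(n, r, gamma):
    """Evaluate e^{H(X)} (C p^{-m/2} + (C')^p) for the simplex
    {x >= 0, x_1 + ... + x_n = r}, where m = 1, p = n and each mean is z = r / n."""
    z = r / n
    entropy = n * ((z + 1) * math.log(z + 1) - z * math.log(z))   # formula (1)
    q = z / (1 + z)
    alpha = 2 * q / (1 - q) ** 2
    # c is the larger of the two expressions in the definition of c_i.
    c_first = math.log(1 + alpha * (1 - math.cos(gamma / math.sqrt(alpha)))) / gamma**2
    c_second = math.log(1 + 2 * alpha) / (alpha * math.pi**2)
    c = max(c_first, c_second)
    C = (2 * math.pi) ** -0.5 * (c * alpha) ** -0.5
    C_prime = math.exp(-gamma**2 * c / 2)
    return math.exp(entropy) * (C * n ** -0.5 + C_prime ** n)

bound_small = theorem_2a_bound_simplex(1000, 10, 0.172)
bound_large = theorem_2a_bound_simplex(10000, 100, 0.0645)
print(f"{bound_small:.3e}", f"{bound_large:.3e}")
```

With the values of $\gamma$ quoted in section 3.1, the two printed numbers should land close to the upper bounds reported there for $(n, r) = (1000, 10)$ and $(10000, 100)$.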



All notation introduced in Theorem 2a is used throughout this section, and all its 
hypotheses (importantly, the integrality of A) are assumed to hold. In subsection 
5.1, we introduce a series of definitions and lemmas, then prove Theorem 2a under 
assumption of the lemmas. In subsection 5.2, we prove the lemmas in turn. For 
bounds on the constants $C$ and $C'$, see subsection 5.3.

5.1. Supporting results and proof of Theorem 2a.

Definition 1. For $1 \le k \le p$, define the function $\Pi_k : \mathbb{R}^m \to \mathbb{R}$ by

$$\Pi_k(t) := \prod_{j=(k-1)m+1}^{km} \frac{1}{\sqrt{1 + \alpha_j (1 - \cos\langle t, a_j \rangle)}} \quad \text{for } t \in (-\pi, \pi]^m,$$

$$\Pi_k(t) := 0 \quad \text{for } t \notin (-\pi, \pi]^m.$$

Lemma 3. Given the definition above,

$$\mathrm{conc}(AX) \le \frac{1}{(2\pi)^m} \int_{(-\pi, \pi]^m} \Pi_1 \Pi_2 \cdots \Pi_p \, dt.$$
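
Lemma 3 can be checked numerically in the simplest case $m = 1$ with every $a_j = 1$, so that $AX = X_1 + \cdots + X_n$. The Python sketch below evaluates the integral by the midpoint rule and compares it with the exact concentration of the sum, obtained by convolving (truncated) geometric mass functions. The parameter choice $q_j = 1/2$ for four variables, and both function names, are hypothetical; for this choice the integrand is $(5 - 4\cos t)^{-2}$, whose normalized integral has the closed form $5/27$.

```python
import math

def lemma3_integral_1d(qs, steps=20_000):
    """Midpoint-rule value of (1 / 2 pi) * integral over (-pi, pi] of
    prod_j (1 + alpha_j (1 - cos t))^(-1/2), i.e. the case m = 1, a_j = 1."""
    alphas = [2 * q / (1 - q) ** 2 for q in qs]
    h = 2 * math.pi / steps
    total = 0.0
    for s in range(steps):
        t = -math.pi + (s + 0.5) * h
        total += math.prod((1 + a * (1 - math.cos(t))) ** -0.5 for a in alphas)
    return total * h / (2 * math.pi)

def conc_of_sum(qs, support=400):
    """Largest point mass of X_1 + ... + X_n for independent geometric X_j
    (mass functions truncated to {0, ..., support - 1})."""
    pmf = [1.0]
    for q in qs:
        geo = [(1 - q) * q**k for k in range(support)]
        new = [0.0] * support
        for i, p_i in enumerate(pmf):
            for k in range(support - i):
                new[i + k] += p_i * geo[k]
        pmf = new
    return max(pmf)

qs = [0.5] * 4                    # hypothetical parameters
upper = lemma3_integral_1d(qs)
exact = conc_of_sum(qs)
print(exact, upper)
```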

Definition 2. Given a measurable function $\Phi : \mathbb{R}^m \to \mathbb{R}_{\ge 0}$, we define its epigraphs

$$\mathcal{D}_\tau(\Phi) := \{t \in \mathbb{R}^m : \Phi(t) \ge \tau\}$$

for all $\tau > 0$.

Suppose $\Phi$ vanishes at infinity, meaning that $\mathcal{D}_\tau(\Phi)$ has finite volume for each
$\tau > 0$. Then we define its symmetrically decreasing rearrangement as the function
$\Phi^* : \mathbb{R}^m \to \mathbb{R}_{\ge 0}$ given by

$$\Phi^*(t) := \max\{\tau : \mathrm{vol}(\mathcal{D}_\tau(\Phi)) \ge \|t\|^m v_m\},$$

where $v_m$ denotes the volume of the unit ball in $\mathbb{R}^m$.

The theory of symmetrically decreasing rearrangements is treated in [3], and we do
not develop it fully here. The important properties of $\Phi^*$ are that

• $\Phi^*$ is symmetrically decreasing, i.e., $\|t\| \ge \|s\| \Rightarrow \Phi^*(t) \le \Phi^*(s)$; and

• $\Phi^*$ is equimeasurable with $\Phi$, i.e., $\mathrm{vol}(\mathcal{D}_\tau(\Phi^*)) = \mathrm{vol}(\mathcal{D}_\tau(\Phi))$ for all $\tau > 0$.

Note that $\Phi^*$ is the unique function with these properties, up to a difference on a
set of measure zero.

Lemma 4. Given the definition above,

$$\int_{(-\pi, \pi]^m} \Pi_1 \Pi_2 \cdots \Pi_p \, dt \le \int_{\mathbb{R}^m} \Pi_1^* \Pi_2^* \cdots \Pi_p^* \, dt.$$

Definition 3. For $1 \le k \le p$, define the function $\Pi_k^{\mathrm{rect}} : \mathbb{R}^m \to \mathbb{R}$ by

$$\Pi_k^{\mathrm{rect}}(t) := \prod_{i=1}^{m} \frac{1}{\sqrt{1 + \alpha_{(k-1)m+i} (1 - \cos t_i)}} \quad \text{for } t \in (-\pi, \pi]^m,$$

$$\Pi_k^{\mathrm{rect}}(t) := 0 \quad \text{for } t \notin (-\pi, \pi]^m.$$

The formula for $\Pi_k^{\mathrm{rect}}$ differs from that for $\Pi_k$ in that the linear form $\langle t, a_{(k-1)m+i} \rangle$
in the denominator of $\Pi_k$ is replaced by $t_i$. Effectively, each basis

$$a_{(k-1)m+1}, a_{(k-1)m+2}, \ldots, a_{km}$$

of $\mathbb{R}^m$ is replaced by a standard basis. This will make $\Pi_k^{\mathrm{rect}}$ easier to work with than
$\Pi_k$.

Lemma 5. Let $1 \le k \le p$. Then

$$\mathrm{vol}(\mathcal{D}_\tau(\Pi_k^{\mathrm{rect}})) = \mathrm{vol}(\mathcal{D}_\tau(\Pi_k))$$

for all $\tau > 0$, and $(\Pi_k^{\mathrm{rect}})^* = \Pi_k^*$.

Lemma 6 (Isotonicity of rearrangement). Suppose $\Phi, \Psi : \mathbb{R}^m \to \mathbb{R}_{\ge 0}$ are measurable
functions vanishing at infinity. Let $\tau$ denote a constant. Then:

(1) If $\Phi(t) \ge \Psi(t)$ for all $t$, then $\Phi^*(t) \ge \Psi^*(t)$ for all $t$.

(2) If $\Phi(t) \ge \max\{\Psi(t), \tau\}$ for all $t$, then $\Phi^*(t) \ge \max\{\Psi^*(t), \tau\}$ for all $t$.
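
Lemma 7. Define $\alpha_i^{\vee}$ and $c_i$ as in the statement of Theorem 2a.
Then, for $0 \le t \le \min\big\{ \frac{\gamma}{\sqrt{\alpha_i^{\vee}}}, \pi \big\}$, we have $1 + \alpha_i^{\vee}(1 - \cos t) \ge e^{c_i \alpha_i^{\vee} t^2}$.

Lemma 8. For each $k = 1, 2, \ldots, p$, and for all $t \in \mathbb{R}^m$, we have

$$\Pi_k^{\mathrm{rect}}(t) \le \max\Big\{ \prod_{i=1}^{m} e^{-c_i \alpha_i^{\vee} t_i^2 / 2},\ C' \Big\}.$$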

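Given the above lemmas, we can prove Theorem 2a:

Proof of Theorem 2a. Using Lemmas 3, 4, and 5, we have

$$\begin{aligned}
\mathrm{conc}(AX) &\le \frac{1}{(2\pi)^m} \int_{(-\pi, \pi]^m} \Pi_1 \Pi_2 \cdots \Pi_p \, dt \\
&\le \frac{1}{(2\pi)^m} \int_{\mathbb{R}^m} \Pi_1^* \Pi_2^* \cdots \Pi_p^* \, dt \\
&= \frac{1}{(2\pi)^m} \int_{\mathbb{R}^m} (\Pi_1^{\mathrm{rect}})^* (\Pi_2^{\mathrm{rect}})^* \cdots (\Pi_p^{\mathrm{rect}})^* \, dt.
\end{aligned}$$

We may instead take either of the last two integrals over $B$, the closed ball of
volume $(2\pi)^m$ centered at the origin in $\mathbb{R}^m$, since the integrands are zero outside
this ball.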
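By Lemmas 6 and 8, we have

$$\begin{aligned}
\frac{1}{(2\pi)^m} \int_B (\Pi_1^{\mathrm{rect}})^* \cdots (\Pi_p^{\mathrm{rect}})^* \, dt
&\le \frac{1}{(2\pi)^m} \int_B \prod_{k=1}^{p} \Big( \max\Big\{ \prod_{i=1}^{m} e^{-c_i \alpha_i^{\vee} t_i^2 / 2},\ C' \Big\} \Big)^* \, dt \\
&= \frac{1}{(2\pi)^m} \int_B \Big( \max\Big\{ \Big( \prod_{i=1}^{m} e^{-c_i \alpha_i^{\vee} t_i^2 / 2} \Big)^*,\ C' \Big\} \Big)^{p} \, dt \\
&= \frac{1}{(2\pi)^m} \int_{(-\pi, \pi]^m} \Big( \max\Big\{ \prod_{i=1}^{m} e^{-c_i \alpha_i^{\vee} t_i^2 / 2},\ C' \Big\} \Big)^{p} \, dt.
\end{aligned}$$

This last integral is bounded above by

$$\begin{aligned}
\frac{1}{(2\pi)^m} \Big[ \int_{\mathbb{R}^m} \exp\Big( -p \sum_{i=1}^{m} c_i \alpha_i^{\vee} t_i^2 / 2 \Big) \, dt + (2\pi)^m (C')^{p} \Big]
&= \frac{1}{(2\pi)^m} \cdot (2\pi)^{m/2} \, p^{-m/2} \prod_{i=1}^{m} (c_i \alpha_i^{\vee})^{-1/2} + (C')^{p} \\
&= C p^{-m/2} + (C')^{p}.
\end{aligned}$$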

Now, a technical remark. In integrating the Gaussian term, we assumed $c_i \alpha_i^{\vee} > 0$.
To see why this is necessarily true, note that we assumed, in the statement of Theorem 2a, that $\langle a_j, b \rangle > 0$ for $1 \le j \le n$. Thus $P$ is not contained in any coordinate
hyperplane of $\mathbb{R}^n$. Recall section 1, (1), which gives the entropy $H(X)$ in terms
of $z_1, z_2, \ldots, z_n$ (the coordinates of $E[X]$). One may check that $\frac{\partial}{\partial z_j} H(X) = \infty$
when $z_j = 0$, but is finite when $z_j > 0$. Therefore the maximum-entropy distribution for $X$ does not take expected value on a coordinate hyperplane; therefore,
$c_i \alpha_i^{\vee} > 0$.

Theorem 2a now follows by section 2.1, (2). ■

Remarks. Our strategy for bounding conc(AX), carried out above, may be mo- 
tivated as follows. First, we obtain an integral formula for the probability mass 
function of AX, derived from its Fourier transform (Lemma 3). The integrand 
splits into n factors, which we then group into maximal subproducts such that the 
factors in each subproduct behave like independent random variables on the domain 
of integration. The worst case is now that these subproducts themselves are "com- 
pletely non-independent," that is, that they decay identically; this is the significance 
of Lemmas 4 and 5, and of the definitions of $\Phi^*$ and $\Pi_k^{\mathrm{rect}}$. We bound the decay of the
integrand near the origin by a Gaussian (Lemma 8), explaining the appearance of
the $C p^{-m/2}$ term in the conclusion of Theorem 2a. Away from the origin, we simply
bound each subproduct by the constant $C'$, giving the $(C')^{p}$ term. The parameter
$\gamma$ controls the boundary between the two approximation regimes.

This two-regime bound (with arbitrary parameter $\gamma$) is sufficient for Corollary 2, as
the $(C')^{p}$ term is asymptotically negligible as $p \to \infty$. However, for non-asymptotic
computations, the crudity of the approximation away from the origin is very notice- 
able. The (C') p term can be replaced by a more sensitive approximation, at the cost 
of simplicity. We do not pursue this goal here. 

5.2. Proofs of preceding lemmas. 

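Proof of Lemma 3. In [1], Lemma 8.1, the following integral representation is
proved:

$$\Pr[AX = b] = \frac{1}{(2\pi)^m} \int_{(-\pi, \pi]^m} e^{-i \langle t, b \rangle} \prod_{j=1}^{n} \frac{1 - q_j}{1 - q_j e^{i \langle t, a_j \rangle}} \, dt,$$

where $b$ is an arbitrary $\mathbb{Z}^m$-vector. It follows that

$$\begin{aligned}
\mathrm{conc}(AX) &\le \frac{1}{(2\pi)^m} \int_{(-\pi, \pi]^m} \prod_{j=1}^{n} \left| \frac{1 - q_j}{1 - q_j e^{i \langle t, a_j \rangle}} \right| \, dt \\
&= \frac{1}{(2\pi)^m} \int_{(-\pi, \pi]^m} \prod_{j=1}^{n} \frac{1}{\sqrt{1 + \alpha_j (1 - \cos\langle t, a_j \rangle)}} \, dt \\
&= \frac{1}{(2\pi)^m} \int_{(-\pi, \pi]^m} \Pi_1 \Pi_2 \cdots \Pi_p \, dt,
\end{aligned}$$

where the last two steps are straightforward simplification. □

Proof of Lemma 4. The Hardy-Littlewood inequality [3] states that for measurable
functions $\Phi, \Psi : \mathbb{R}^m \to \mathbb{R}_{\ge 0}$ vanishing at infinity, one has

$$\int_{\mathbb{R}^m} \Phi(t) \Psi(t) \, dt \le \int_{\mathbb{R}^m} \Phi^*(t) \Psi^*(t) \, dt,$$

provided that the integral on the right-hand side converges. Thus we obtain

$$\int_{(-\pi, \pi]^m} \Pi_1 \Pi_2 \cdots \Pi_p \, dt \le \int_{\mathbb{R}^m} \Pi_1^* \Pi_2^* \cdots \Pi_p^* \, dt$$

by induction on $p$. □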



Proof of Lemma 5. Let $A^*$ be the $m \times m$ matrix whose rows are $a_{(k-1)m+1}^{T}, a_{(k-1)m+2}^{T}, \ldots, a_{km}^{T}$, and define
$\mathcal{A} : \mathbb{R}^m \to \mathbb{R}^m$ as the linear map $t \mapsto A^* t$. Thus,

$$\mathcal{A}(t)_i = \langle t, a_{(k-1)m+i} \rangle \qquad (1 \le i \le m).$$

This map $\mathcal{A}$ scales the volume of measurable sets uniformly by a factor of $d := |\det(A^*)|$, and takes the lattice $\Lambda := (2\pi\mathbb{Z})^m$ to the lattice

$$\Lambda' := 2\pi\mathbb{Z}[\mathrm{col}_1(A^*), \mathrm{col}_2(A^*), \ldots, \mathrm{col}_m(A^*)].$$

Let $K := (-\pi, \pi]^m$ and let $K' := \mathcal{A}(K)$. Since $K$ is a fundamental region of $\Lambda$,
it follows that $K'$ is a fundamental region of $\Lambda'$. Moreover, we assumed $A$ to have
integer entries, so $\Lambda'$ is a sublattice of index $d$ in $\Lambda$, and the induced map of tori
$\phi : \mathbb{R}^m / \Lambda' \to \mathbb{R}^m / \Lambda$ is an even covering of order $d$.

Identifying $K$ with $\mathbb{R}^m / \Lambda$ and $K'$ with $\mathbb{R}^m / \Lambda'$, we may regard $\phi$ as a map from
$K'$ to $K$, and $\phi \circ \mathcal{A}$ as a self-map of $K$. If $U \subset K$ is a measurable set, then
$(\phi \circ \mathcal{A})^{-1}(U)$ is the union of $d$ disjoint preimages each of volume $\frac{\mathrm{vol}(U)}{d}$. Thus,
$\mathrm{vol}((\phi \circ \mathcal{A})^{-1}(U)) = \mathrm{vol}(U)$.

Observe that $\cos t_i = \cos(\phi(t)_i)$ for all $t$. Therefore

$$\mathcal{D}_\tau(\Pi_k) = (\phi \circ \mathcal{A})^{-1}(\mathcal{D}_\tau(\Pi_k^{\mathrm{rect}})),$$

from which it follows that

$$\mathrm{vol}(\mathcal{D}_\tau(\Pi_k^{\mathrm{rect}})) = \mathrm{vol}(\mathcal{D}_\tau(\Pi_k)).$$

This conclusion holds for all $\tau > 0$, so it follows from the definition of the symmetrically decreasing rearrangement that $(\Pi_k^{\mathrm{rect}})^* = \Pi_k^*$. □

Proof of Lemma 6. We prove (1) by contradiction. Suppose that $\Phi(t) \ge \Psi(t)$ for
all $t$, but suppose $\Phi^*(t_0) < \Psi^*(t_0)$ for some $t_0$. Let $\tau_0 := \Psi^*(t_0)$. Then

$$\mathrm{vol}(\mathcal{D}_{\tau_0}(\Phi)) < \|t_0\|^m v_m \le \mathrm{vol}(\mathcal{D}_{\tau_0}(\Psi)),$$

where $v_m$ is the volume of the unit ball in $\mathbb{R}^m$. It follows that $\mathcal{D}_{\tau_0}(\Psi) \setminus \mathcal{D}_{\tau_0}(\Phi)$ has
positive measure, contradicting our assumption that $\Phi(t) \ge \Psi(t)$ for all $t$.

Statement (2) follows from (1) by the observation that $\max\{\Psi^*(t), \tau\}$ is the symmetrically decreasing rearrangement of $\max\{\Psi(t), \tau\}$. □



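Proof of Lemma 7. Recall that

$$c_i = \max\left\{ \frac{1}{\gamma^2} \ln\Big[ 1 + \alpha_i^{\vee} \Big( 1 - \cos \frac{\gamma}{\sqrt{\alpha_i^{\vee}}} \Big) \Big],\ \frac{1}{\alpha_i^{\vee} \pi^2} \ln\big[ 1 + 2 \alpha_i^{\vee} \big] \right\}.$$

In particular,

$$c_i = \frac{1}{\gamma^2} \ln\Big[ 1 + \alpha_i^{\vee} \Big( 1 - \cos \frac{\gamma}{\sqrt{\alpha_i^{\vee}}} \Big) \Big] \quad \text{if } \alpha_i^{\vee} \ge \frac{\gamma^2}{\pi^2},$$

and

$$c_i = \frac{1}{\alpha_i^{\vee} \pi^2} \ln\big[ 1 + 2 \alpha_i^{\vee} \big] \quad \text{if } \alpha_i^{\vee} \le \frac{\gamma^2}{\pi^2}.$$

Define $t_0 := \min\big\{ \frac{\gamma}{\sqrt{\alpha_i^{\vee}}}, \pi \big\}$, and define

$$f(t) := 1 + \alpha_i^{\vee}(1 - \cos t) - e^{c_i \alpha_i^{\vee} t^2} \quad \text{for } -t_0 \le t \le t_0.$$

Note that $f(0) = 0$. Also, we claim that $f(t_0) = 0$. This must be verified in two
cases, according to whether $\alpha_i^{\vee} \ge \frac{\gamma^2}{\pi^2}$ or $\alpha_i^{\vee} < \frac{\gamma^2}{\pi^2}$.

If $\alpha_i^{\vee} \ge \frac{\gamma^2}{\pi^2}$ then $t_0 = \frac{\gamma}{\sqrt{\alpha_i^{\vee}}}$, so

$$f(t_0) = 1 + \alpha_i^{\vee} \Big( 1 - \cos \frac{\gamma}{\sqrt{\alpha_i^{\vee}}} \Big) - \exp\Big( \frac{1}{\gamma^2} \ln\Big[ 1 + \alpha_i^{\vee} \Big( 1 - \cos \frac{\gamma}{\sqrt{\alpha_i^{\vee}}} \Big) \Big] \cdot \alpha_i^{\vee} \cdot \frac{\gamma^2}{\alpha_i^{\vee}} \Big) = 0.$$

If $\alpha_i^{\vee} < \frac{\gamma^2}{\pi^2}$ then $t_0 = \pi$, and

$$f(t_0) = 1 + 2 \alpha_i^{\vee} - \exp\Big( \frac{1}{\alpha_i^{\vee} \pi^2} \ln\big[ 1 + 2 \alpha_i^{\vee} \big] \cdot \alpha_i^{\vee} \pi^2 \Big) = 0.$$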



This proves the claim that $f(t_0) = 0$. It follows that the average value of $f'(t)$ on
$[0, t_0]$ is zero.

Finally, we observe that $f'(0) = 0$, and that $f(t)$ has nonpositive third derivative on
$[0, t_0]$ (indeed, on $[0, \pi]$). The verification of these claims is routine and is omitted.
We infer that either $f'(t) = 0$ on $[0, t_0]$, or $f''(t)$ has exactly one sign change on
$[0, t_0]$, from positive to negative. In the latter case, $f'(t)$ must also have exactly one
sign change on $[0, t_0]$ (also from positive to negative), since its average value on the
interval is zero. It follows in either case that $f(t) \ge 0$ on $[0, t_0]$, and thus on $[-t_0, t_0]$
(since $f(t)$ is an even function). This proves the lemma. □
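
The inequality of Lemma 7 is straightforward to probe numerically. The Python sketch below evaluates $f(t) = 1 + \alpha(1 - \cos t) - e^{c\alpha t^2}$ on a grid over $[0, t_0]$ for a few hypothetical pairs $(\alpha, \gamma)$, taking $c$ to be the larger of $\gamma^{-2}\ln[1 + \alpha(1 - \cos(\gamma/\sqrt{\alpha}))]$ and $(\alpha\pi^2)^{-1}\ln[1 + 2\alpha]$. The function names and sample values are ours; a grid check is of course no substitute for the proof.

```python
import math

def c_const(alpha, gamma):
    """The constant c for a given alpha and gamma (larger of the two expressions)."""
    first = math.log(1 + alpha * (1 - math.cos(gamma / math.sqrt(alpha)))) / gamma**2
    second = math.log(1 + 2 * alpha) / (alpha * math.pi**2)
    return max(first, second)

def lemma7_gap(alpha, gamma, samples=2000):
    """Smallest value of f(t) = 1 + alpha (1 - cos t) - exp(c alpha t^2)
    over an evenly spaced grid on [0, t0], t0 = min(gamma / sqrt(alpha), pi)."""
    c = c_const(alpha, gamma)
    t0 = min(gamma / math.sqrt(alpha), math.pi)
    return min(
        1 + alpha * (1 - math.cos(t)) - math.exp(c * alpha * t * t)
        for t in (t0 * k / samples for k in range(samples + 1))
    )

# Hypothetical sample parameters covering both cases of the definition of c.
gaps = [lemma7_gap(a, g) for a in (0.02, 0.5, 4.0, 50.0) for g in (0.2, 1.0, 3.0)]
print(min(gaps))
```

The smallest gap is attained at the endpoints $t = 0$ and $t = t_0$, where $f$ vanishes up to rounding error.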

Lemma 7 is used to establish Lemma 8. 

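Proof of Lemma 8. Let

$$K := \Big\{ t \in \mathbb{R}^m : |t_i| \le \min\Big\{ \frac{\gamma}{\sqrt{\alpha_i^{\vee}}}, \pi \Big\} \text{ for } i = 1, 2, \ldots, m \Big\}.$$

If $t \in K$, then by Lemma 7,

$$\begin{aligned}
\Pi_k^{\mathrm{rect}}(t) &= \prod_{i=1}^{m} \frac{1}{\sqrt{1 + \alpha_{(k-1)m+i} (1 - \cos t_i)}} \\
&\le \prod_{i=1}^{m} \frac{1}{\sqrt{1 + \alpha_i^{\vee} (1 - \cos t_i)}} \\
&\le \prod_{i=1}^{m} e^{-c_i \alpha_i^{\vee} t_i^2 / 2}.
\end{aligned}$$

Now suppose $t \notin K$. Thus, there exists some $i$ such that $|t_i| > \min\big\{ \frac{\gamma}{\sqrt{\alpha_i^{\vee}}}, \pi \big\}$.
If $|t_i| > \pi$, then we trivially have $\Pi_k^{\mathrm{rect}}(t) = 0 < C'$.

Otherwise, we have $|t_i| > \frac{\gamma}{\sqrt{\alpha_i^{\vee}}}$, and therefore

$$\begin{aligned}
\Pi_k^{\mathrm{rect}}(t) &\le \frac{1}{\sqrt{1 + \alpha_i^{\vee} (1 - \cos t_i)}} \\
&\le \frac{1}{\sqrt{1 + \alpha_i^{\vee} \big( 1 - \cos (\gamma / \sqrt{\alpha_i^{\vee}}) \big)}} \\
&= e^{-\gamma^2 c_i / 2} \\
&\le C'.
\end{aligned}$$

Thus whether $t \in K$ or $t \notin K$, we have

$$\Pi_k^{\mathrm{rect}}(t) \le \max\Big\{ \prod_{i=1}^{m} e^{-c_i \alpha_i^{\vee} t_i^2 / 2},\ C' \Big\},$$

proving the lemma. □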



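5.3. Upper bounds on $C$, $C'$. We now obtain

Theorem 2b. Defining all constants as in the statement of Theorem 2a,

$$C \le \left( \frac{\gamma}{2 \sqrt{\pi \ln\big( 1 + \frac{2\gamma^2}{\pi^2} \big)}} \right)^{m} \prod_{i=1}^{m} \frac{1 - q_i^{\vee}}{\sqrt{q_i^{\vee}}}$$

and

$$C' \le \frac{1}{\sqrt{1 + \frac{2\gamma^2}{\pi^2}}}.$$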

Remarks. Notice that as $\gamma \to \infty$, all other inputs being fixed, we have $C = O\big( (\gamma / \sqrt{\ln \gamma})^{m} \big)$ and $C' = O(1/\gamma)$. There is thus a trade-off between optimizing the
$C p^{-m/2}$ term in Theorem 2a and optimizing the $(C')^{p}$ term; the optimal choice of $\gamma$
depends upon the other inputs.

Notice, also, that for fixed $\gamma$ and for values of $q_i^{\vee}$ bounded away from zero, the
constant $C$ is essentially a constant multiple of the bound on $\mathrm{conc}(AX)$ in Theorem
1. In fact, for (say) $\gamma = 1$, we have

$$C \le (.657)^{m} \prod_{i=1}^{m} \frac{1 - q_i^{\vee}}{\sqrt{q_i^{\vee}}},$$

suggesting that the results of Theorem 2a are significantly better than those of
Theorem 1 when $p$ is large enough that the $C p^{-m/2}$ term dominates.
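
The numerical constant quoted for $\gamma = 1$ can be reproduced in a few lines. The Python sketch below assumes that the per-coordinate factor in the bound on $C$ is $\gamma / \big(2\sqrt{\pi \ln(1 + 2\gamma^2/\pi^2)}\big)$ and that the bound on $C'$ is $(1 + 2\gamma^2/\pi^2)^{-1/2}$; both function names are hypothetical.

```python
import math

def theorem_2b_factor(gamma):
    """Per-coordinate factor gamma / (2 sqrt(pi ln(1 + 2 gamma^2 / pi^2)))."""
    return gamma / (2 * math.sqrt(math.pi * math.log(1 + 2 * gamma**2 / math.pi**2)))

def c_prime_bound(gamma):
    """Upper bound (1 + 2 gamma^2 / pi^2)^(-1/2) on C'."""
    return (1 + 2 * gamma**2 / math.pi**2) ** -0.5

factor_at_one = theorem_2b_factor(1.0)
print(round(factor_at_one, 3), c_prime_bound(1.0))
```

Evaluating both functions over a range of $\gamma$ makes the trade-off between the two terms of Theorem 2a visible: the first grows with $\gamma$ while the second shrinks.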
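Proof of Theorem 2b. Recall that

$$c_i = \begin{cases} \dfrac{1}{\gamma^2} \ln\Big[ 1 + \alpha_i^{\vee} \Big( 1 - \cos \dfrac{\gamma}{\sqrt{\alpha_i^{\vee}}} \Big) \Big] & \text{if } \alpha_i^{\vee} \ge \dfrac{\gamma^2}{\pi^2}, \\[2ex] \dfrac{1}{\alpha_i^{\vee} \pi^2} \ln\big[ 1 + 2 \alpha_i^{\vee} \big] & \text{if } \alpha_i^{\vee} \le \dfrac{\gamma^2}{\pi^2}, \end{cases}$$

$$C := (2\pi)^{-m/2} \prod_{i=1}^{m} (c_i \alpha_i^{\vee})^{-1/2},$$

and

$$C' := \max_{1 \le i \le m} e^{-\gamma^2 c_i / 2}.$$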



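Regarding $c_i$ as a function of $\alpha_i^{\vee}$, we claim that this function is minimized at $\alpha_i^{\vee} = \frac{\gamma^2}{\pi^2}$.

To demonstrate this claim, it suffices to check that:

(1) The function $f(x) := \frac{1}{x} \ln(1 + 2x)$ is decreasing for $0 < x < \frac{\gamma^2}{\pi^2}$.

(2) The function $g(x) := x \big( 1 - \cos \frac{\gamma}{\sqrt{x}} \big)$ is increasing for $\frac{\gamma^2}{\pi^2} < x < \infty$.

Proof of (1): Differentiating, we obtain $f'(x) = \frac{1}{x^2} \big[ \frac{2x}{1 + 2x} - \ln(1 + 2x) \big]$. In general,
$\ln(1 + u) > \frac{u}{1 + u}$ for $u > 0$, so we have $f'(x) < 0$ for all $x > 0$. In particular, $f(x)$ is
decreasing for $0 < x < \frac{\gamma^2}{\pi^2}$.

Proof of (2): Differentiating, we obtain $g'(x) = 1 - \cos \frac{\gamma}{\sqrt{x}} - \frac{\gamma}{2\sqrt{x}} \sin \frac{\gamma}{\sqrt{x}}$. It will be
convenient to define $y := y(x) = \frac{\gamma}{\sqrt{x}}$. This change of variable bijectively transforms
the interval $\frac{\gamma^2}{\pi^2} < x < \infty$ into the interval $0 < y < \pi$. We may hence write
$g'(x) = h(y)$, where

$$h(y) := 1 - \cos y - \frac{y}{2} \sin y.$$

Differentiating twice with respect to $y$, we obtain

$$\frac{dh}{dy} = \frac{1}{2} \sin y - \frac{y}{2} \cos y \quad \text{and} \quad \frac{d^2 h}{dy^2} = \frac{y}{2} \sin y.$$

In particular, note that $h(0) = 0$, $h'(0) = 0$, and $h''(y) > 0$ for $0 < y < \pi$. It follows
that $h(y) > 0$ for $0 < y < \pi$. Equivalently, $g'(x) > 0$ (and $g(x)$ is increasing) for
$\frac{\gamma^2}{\pi^2} < x < \infty$.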



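We have thus proved that $c_i$ is minimized when $\alpha_i^{\vee} = \frac{\gamma^2}{\pi^2}$, in which case
$c_i = \frac{1}{\gamma^2} \ln\big( 1 + \frac{2\gamma^2}{\pi^2} \big)$. That is to say,

$$c_i \ge \frac{1}{\gamma^2} \ln\Big( 1 + \frac{2\gamma^2}{\pi^2} \Big)$$

for all values of $\alpha_i^{\vee}$. It follows that

$$C = (2\pi)^{-m/2} \prod_{i=1}^{m} (c_i \alpha_i^{\vee})^{-1/2} \le \left( \frac{\gamma}{\sqrt{2\pi \ln\big( 1 + \frac{2\gamma^2}{\pi^2} \big)}} \right)^{m} \prod_{i=1}^{m} (\alpha_i^{\vee})^{-1/2} = \left( \frac{\gamma}{2 \sqrt{\pi \ln\big( 1 + \frac{2\gamma^2}{\pi^2} \big)}} \right)^{m} \prod_{i=1}^{m} \frac{1 - q_i^{\vee}}{\sqrt{q_i^{\vee}}}$$

and

$$C' = \max_{1 \le i \le m} e^{-\gamma^2 c_i / 2} \le \Big( 1 + \frac{2\gamma^2}{\pi^2} \Big)^{-1/2},$$

proving Theorem 2b. ■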

6. Proof of Theorem 3 

We obtain Theorem 3 as a corollary of Proposition 3a, a more general result to 
follow. In order to state and prove Proposition 3a, we borrow the following notions 
from the theory of partially ordered sets (posets). 

Definitions 4. Let $S$ be a poset and $x, y \in S$. We say that $x$ covers $y$ if $x > y$ and
$x \ge z \ge y \Rightarrow z \in \{x, y\}$.

A rank function on a finite poset $S$ is a function $\mathrm{rk} : S \to \mathbb{Z}_{\ge 0}$, such that for all
$x, y \in S$, if $x$ covers $y$, then $\mathrm{rk}(x) = \mathrm{rk}(y) + 1$. We say that $\mathrm{rk}(x)$ is the rank of
element x. A layer of a ranked poset is a level set of the rank function. 

The chain of cardinality N is denoted by [N], and is automatically assigned herein 
the unique rank function which assigns its least element rank 0. The product of two 
ranked posets S, S' is automatically assigned rank function equal to the sum of the 
rank functions of S, S' . 

An antichain in a poset is a collection of pairwise incomparable elements. The 
width of a poset $S$, denoted by $w(S)$, is the cardinality of its largest antichain(s).
The Whitney number Wi of a ranked poset is the cardinality of its layer of rank i. 
If the width of a ranked poset is equal to its largest Whitney number, then we say 
that the poset has the Sperner property. 

For example, the "Boolean cube" ([2] x [2] x [2]) has Whitney numbers 1, 3, 3, 1 and 
width 3. Note that the width of any poset is greater than or equal to its largest 
Whitney number, because all layers are necessarily antichains. 
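
For products of chains, the Whitney numbers are simply the coefficients of the polynomial $\prod_j (1 + x + \cdots + x^{N_j - 1})$, since the rank of an element is the sum of the ranks of its coordinates. The following Python sketch computes them by repeated polynomial multiplication and recovers the Whitney numbers of the Boolean cube; the function name is our own.

```python
def whitney_numbers(chain_sizes):
    """Whitney numbers of the product [N_1] x ... x [N_p] of chains:
    the coefficients of prod_j (1 + x + ... + x^(N_j - 1))."""
    coeffs = [1]
    for n in chain_sizes:
        new = [0] * (len(coeffs) + n - 1)
        for i, c in enumerate(coeffs):
            for k in range(n):
                new[i + k] += c
        coeffs = new
    return coeffs

cube = whitney_numbers([2, 2, 2])
print(cube, max(cube))
```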

Now we are ready to state

Proposition 3a. Let $X_1, X_2, \ldots, X_p$ be independent, integer-valued random variables such that

$$\mathrm{conc}(X_j) \le \frac{1}{N_j} \quad \text{for } 1 \le j \le p,$$

where $N_1, N_2, \ldots, N_p$ are positive integers. Then

$$\mathrm{conc}(X_1 + X_2 + \cdots + X_p) \le \frac{w([N_1] \times \cdots \times [N_p])}{N_1 N_2 \cdots N_p}.$$

Moreover, given any fixed $N$ such that $2 \le N_1, N_2, \ldots, N_p \le N$, we have

$$\frac{w([N_1] \times \cdots \times [N_p])}{N_1 N_2 \cdots N_p} \sim \Big( \frac{\pi}{6} \sum_{j=1}^{p} (N_j^2 - 1) \Big)^{-1/2}$$

as $p \to \infty$.
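
The Gaussian-type asymptotics can be observed numerically. The Python sketch below computes the largest Whitney number of a product of chains, divides it by $N_1 N_2 \cdots N_p$, and compares the result with $\big(\frac{\pi}{6}\sum_j (N_j^2 - 1)\big)^{-1/2}$. It assumes that the width of a product of chains equals its largest Whitney number (the Sperner property, a classical fact for such products that is not proved in this excerpt); the function names and the choice of fifty chains of cardinality 3 are hypothetical.

```python
import math

def largest_whitney_ratio(chain_sizes):
    """Largest Whitney number of [N_1] x ... x [N_p], divided by N_1 N_2 ... N_p."""
    coeffs = [1]
    for n in chain_sizes:
        new = [0] * (len(coeffs) + n - 1)
        for i, c in enumerate(coeffs):
            for k in range(n):
                new[i + k] += c
        coeffs = new
    return max(coeffs) / math.prod(chain_sizes)

def gaussian_prediction(chain_sizes):
    """The quantity ((pi / 6) * sum_j (N_j^2 - 1))^(-1/2)."""
    return (math.pi / 6 * sum(n * n - 1 for n in chain_sizes)) ** -0.5

sizes = [3] * 50                  # hypothetical choice: fifty chains of cardinality 3
ratio = largest_whitney_ratio(sizes) / gaussian_prediction(sizes)
print(ratio)
```

The printed ratio should be close to 1, and it approaches 1 as the number of chains grows.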



This proposition will be easiest to prove under the assumption that each $X_j$ is
uniformly supported on $N_j$ points (with mass $\frac{1}{N_j}$ at each). To justify passing to this
case, we will use the following definition, and the two lemmas after it:

Definition 5. A discrete random variable $Y$ is a mixture of random variables
$Y_1, Y_2, \ldots$ if its probability mass function lies in the convex hull of the probability
mass functions of $Y_1, Y_2, \ldots$.

Lemma 9. Let $Y$ be a random variable, supported on $\mathbb{Z}_{\ge 0}$, such that
$\operatorname{conc}(Y) \le \frac{1}{N}$. Then $Y$ can be written as a mixture of random variables
$Y_1, Y_2, \ldots$, such that each $Y_k$ is uniformly supported on $N$ points, i.e., has an
$N$-point support with probability mass $\frac{1}{N}$ at each point in its support.

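Proof of Lemma 9. Let $\mathcal{M}$ be the space of probability measures on $\mathbb{Z}_{\ge 0}$. Let
\[ \mathcal{M}(N) := \left\{ \mu \in \mathcal{M} : \max_{k} \mu(\{k\}) \le \frac{1}{N} \right\} \]
and
\[ \mathcal{M}_u(N) := \{ \mu \in \mathcal{M} : \mu \text{ is uniformly supported on } N \text{ points} \}. \]
By the Krein-Milman theorem, $\mathcal{M}(N)$ is the convex hull of its extreme points. We
claim that the extreme points are precisely the points of $\mathcal{M}_u(N)$. It is immediately
evident that each point of $\mathcal{M}_u(N)$ is an extreme point of $\mathcal{M}(N)$. Conversely, suppose
$\mu \in \mathcal{M}(N) \setminus \mathcal{M}_u(N)$. Thus there is some $k \in \mathbb{Z}_{\ge 0}$ such that
$0 < \mu(\{k\}) < \frac{1}{N}$, but in fact, there must be at least two distinct such $k$, since the
total mass of $\mu$ is 1 (an integer multiple of $\frac{1}{N}$). Moving a sufficiently small mass
from one such $k$ to the other, in either direction, yields two distinct measures in
$\mathcal{M}(N)$ whose average is $\mu$. Therefore, $\mu$ is not an extreme point of $\mathcal{M}(N)$.

This proves our claim. Hence the probability measure associated to $Y$ can be written
as a countable convex combination of points of $\mathcal{M}_u(N)$, each of which defines the
distribution of a random variable $Y_k$ (proving the lemma). □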
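The proof above is non-constructive. For a finitely supported distribution, a decomposition of the kind promised by Lemma 9 can be produced explicitly. The sketch below is our own illustration under that finiteness assumption (the lemma itself allows infinite support), and the greedy procedure and the name `uniform_mixture` are not taken from the paper. At each step it peels off a uniform distribution on the N currently heaviest points, with the largest weight that keeps every remaining mass between 0 and 1/N times the weight still unassigned.

```python
from fractions import Fraction

def uniform_mixture(pmf, N):
    """Split pmf (dict point -> Fraction mass, every mass <= 1/N) into a list of
    (weight, support) pairs; each support has N points carrying mass 1/N each."""
    q = {k: N * m for k, m in pmf.items() if m > 0}  # rescaled masses, each in [0, r]
    r = Fraction(1)                                   # mixture weight still unassigned
    if sum(q.values()) != N or max(q.values()) > r:
        raise ValueError("need total mass 1 and every mass <= 1/N")
    parts = []
    while r > 0:
        order = sorted(q, key=lambda k: q[k], reverse=True)
        chosen, rest = order[:N], order[N:]
        # Largest step that keeps chosen entries >= 0 and unchosen entries <= r - step.
        step = min(q[k] for k in chosen)
        if rest:
            step = min(step, r - q[rest[0]])
        for k in chosen:
            q[k] -= step
        r -= step
        parts.append((step, frozenset(chosen)))
    return parts

pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
for weight, support in uniform_mixture(pmf, 2):
    print(weight, sorted(support))
```

Exact rational arithmetic is used so that the loop terminates cleanly; with floating-point masses the stopping test `r > 0` would need a tolerance.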

Lemma 10 (Properties of mixtures). If $Y$ is a mixture of random variables
$Y_1, Y_2, \ldots$, then:

(1) There is some $k \ge 1$ for which $\operatorname{conc}(Y) \le \operatorname{conc}(Y_k)$.

(2) If $Z$ is a random variable and $f$ a function such that $Z = f(Y)$, then $Z$ is a
mixture of random variables $Z_1, Z_2, \ldots$, where $Z_k = f(Y_k)$.

Proof of Lemma 10. By the definition of mixture, there exist nonnegative $\alpha_1, \alpha_2, \ldots$
such that $\alpha_1 + \alpha_2 + \cdots = 1$ and such that
\[ \Pr[Y = y] = \sum_{k=1}^{\infty} \alpha_k \Pr[Y_k = y]. \]
Thus by the pigeonhole principle, for arbitrary $y$, there exists $k = k(y)$ such
that
\[ \Pr[Y = y] \le \Pr[Y_k = y]. \]
Choosing $y$ such that $\operatorname{conc}(Y) = \Pr[Y = y]$, we conclude that
$\operatorname{conc}(Y) \le \operatorname{conc}(Y_k)$ for this $k$. This proves claim (1) in the lemma.
Claim (2) is self-evident. □
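Claim (1) can be illustrated on a two-component mixture. The sketch below is only a numerical illustration with names of our choosing (`conc`, `mixture`); distributions are represented as dictionaries from values to exact rational masses.

```python
from fractions import Fraction

def conc(pmf):
    """Largest point mass of a pmf given as a dict value -> mass."""
    return max(pmf.values())

def mixture(weights, pmfs):
    """pmf of the mixture with the given nonnegative weights summing to 1."""
    out = {}
    for a, pmf in zip(weights, pmfs):
        for y, mass in pmf.items():
            out[y] = out.get(y, 0) + a * mass
    return out

half = Fraction(1, 2)
components = [{0: half, 1: half}, {1: half, 2: half}]
y = mixture([Fraction(1, 3), Fraction(2, 3)], components)

# A mixture is never more concentrated than its most concentrated component.
assert conc(y) <= max(conc(c) for c in components)
```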

The heart of the proof of Proposition 3a is the following version of the local limit 
theorem: 

Definition 6. A sequence $(\ldots, b_0, b_1, b_2, \ldots)$ of nonnegative real numbers is
properly log-concave if it is log-concave (i.e., $b_{t-1} b_{t+1} \le b_t^2$ for all $t$) and has no
internal zeroes (i.e., if $b_t > 0$ and $b_{t+k} > 0$, then $b_{t+1}, b_{t+2}, \ldots, b_{t+k-1} > 0$).
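For a finitely supported sequence, Definition 6 can be tested directly. The following sketch assumes the sequence is given as a finite list and is zero outside it; the function name is ours. The example `[1, 0, 0, 1]` shows why the second condition is needed: it satisfies every log-concavity inequality but has internal zeroes.

```python
def is_properly_log_concave(b):
    """b: finite list of numbers, understood to be zero outside its index range."""
    if any(x < 0 for x in b):
        return False
    padded = [0] + list(b) + [0]
    # Log-concavity: b[t-1] * b[t+1] <= b[t]^2 at every index of b.
    log_concave = all(
        padded[t - 1] * padded[t + 1] <= padded[t] ** 2
        for t in range(1, len(padded) - 1)
    )
    # No internal zeroes: the positive entries form one contiguous block.
    support = [t for t, x in enumerate(b) if x > 0]
    contiguous = not support or all(
        b[t] > 0 for t in range(support[0], support[-1] + 1)
    )
    return log_concave and contiguous

print(is_properly_log_concave([1, 3, 3, 1]))  # True
print(is_properly_log_concave([1, 0, 0, 1]))  # False
```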

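Lemma 11 (Bender). Suppose that $(\zeta_p : p \in \mathbb{N})$ is a sequence of integer-valued
random variables, $(F_p)$ are the corresponding distribution functions, and $(\sigma_p)$ and
$(\mu_p)$ are sequences of real numbers such that
\[ \lim_{p \to \infty} F_p(\sigma_p x + \mu_p) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt \]
for every $x \in \mathbb{R}$. Also suppose that $\sigma_p \to \infty$ as $p \to \infty$. Further, suppose that, for
every $p$, the sequence $b_p(t) := \Pr(\zeta_p = t)$ is properly log-concave with respect to $t$.
Then
\[ \lim_{p \to \infty} \sigma_p \Pr(\zeta_p = \lfloor \sigma_p x + \mu_p \rfloor) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \]
uniformly for all $x \in \mathbb{R}$.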

This result originally appeared in [2], but the above statement is based on its treat- 
ment in [8] ; see either source for a proof. 

Proof of Proposition 3a. For $j = 1, 2, \ldots, p$, we are given to assume that
$\operatorname{conc}(X_j) \le \frac{1}{N_j}$. By Lemma 9, each $X_j$ is a mixture of some random variables
which are each uniformly supported on some $N_j$ points. Thus the random vector
$X = (X_1, \ldots, X_p)$ is a mixture of random vectors each of the form
$X^{(k)} := (X_1^{(k)}, \ldots, X_p^{(k)})$, where the coordinates are independent and each $X_j^{(k)}$ is
uniformly supported on $N_j$ points. The sum $X_1 + \cdots + X_p$ is a function of $X$, so by
using both parts of Lemma 10, we see that
\[ \operatorname{conc}(X_1 + \cdots + X_p) \le \operatorname{conc}(X_1^{(k)} + \cdots + X_p^{(k)}) \]
for some $k$. Since we are seeking an upper bound on $\operatorname{conc}(X_1 + \cdots + X_p)$, we assume
with no loss of generality that $X = X^{(k)}$, or, more to the point, that each coordinate
$X_j$ is uniformly supported on $N_j$ points (with mass $\frac{1}{N_j}$ on each).

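Denote the support of $X_j$ by $\{a_{j1}, a_{j2}, \ldots, a_{jN_j}\}$, where $a_{j1} < a_{j2} < \cdots < a_{jN_j}$.
Then
\[ a_{1 i_1} + a_{2 i_2} + \cdots + a_{p i_p} = a_{1 i'_1} + a_{2 i'_2} + \cdots + a_{p i'_p} \]
implies that the $p$-tuples $(i_1, i_2, \ldots, i_p)$ and $(i'_1, i'_2, \ldots, i'_p)$ are identical or
incomparable in $[N_1] \times \cdots \times [N_p]$. Hence the index tuples producing any given value of
the sum form an antichain, and each tuple occurs with probability $\frac{1}{N_1 \cdots N_p}$. It follows that
\[ \operatorname{conc}(X_1 + \cdots + X_p) \le \frac{w([N_1] \times \cdots \times [N_p])}{N_1 \cdots N_p}. \]
This proves the first claim of Proposition 3a.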
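The first claim can be tested exhaustively on small examples. In the sketch below (our own illustration, with hypothetical function names), each variable is uniform on a given list of distinct integers, the concentration of the sum is computed exactly by enumeration, and the width is taken to be the largest Whitney number, which relies on the Sperner property of chain products.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import prod

def conc_of_sum(supports):
    """Exact concentration of X_1 + ... + X_p, with X_j uniform on supports[j]."""
    counts = Counter(sum(values) for values in product(*supports))
    return Fraction(max(counts.values()), prod(len(s) for s in supports))

def width_bound(sizes):
    """Largest Whitney number of [N_1] x ... x [N_p] divided by N_1 * ... * N_p.
    Assumes the Sperner property, so that this number equals the width."""
    ranks = Counter(sum(r) for r in product(*(range(n) for n in sizes)))
    return Fraction(max(ranks.values()), prod(sizes))

supports = [[0, 5], [1, 2, 3], [0, 10, 20, 30]]
sizes = [len(s) for s in supports]
print(conc_of_sum(supports), width_bound(sizes))
```

Equally spaced supports such as `[[0, 1], [0, 1], [0, 1]]` attain the bound, while spread-out supports like the one above fall well below it.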

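For the remainder of the proof, assume that $2 \le N_1, N_2, \ldots, N_p \le N$ for some
integer $N$. We are going to apply Lemma 11. Let $\zeta_p$ denote the rank of a uniformly
distributed random element of $[N_1] \times [N_2] \times \cdots \times [N_p]$. Set
\[ \mu_p := \frac{N_1 + N_2 + \cdots + N_p - p}{2} \quad \text{and} \quad \sigma_p^2 := \sum_{j=1}^{p} \frac{N_j^2 - 1}{12}. \]
It is easily verified that $\mu_p$ and $\sigma_p^2$ are respectively the mean and
the variance of $\zeta_p$: the rank is a sum of independent coordinates, the $j$-th being uniform
on $\{0, 1, \ldots, N_j - 1\}$. By Lyapunov's central limit theorem [3], the condition
\[ \lim_{p \to \infty} F_p(\sigma_p x + \mu_p) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt \]
in Lemma 11 is satisfied. The hypothesis $\sigma_p \to \infty$ is plainly also satisfied, since
$N_j \ge 2$ gives $\sigma_p^2 \ge p/4$.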

To see that the sequence $b_p(t) := \Pr(\zeta_p = t)$ is properly log-concave, we note
that this sequence is proportional to the Whitney numbers of the chain product
$[N_1] \times [N_2] \times \cdots \times [N_p]$, which is the convolution of the sequences of Whitney
numbers for the factor chains. Each factor chain has Whitney numbers $1, 1, \ldots, 1, 0, 0, \ldots$
(a properly log-concave sequence). Furthermore, the convolution of properly
log-concave sequences is again properly log-concave, see e.g. [H]. Thus, $(b_p(t))$ is
properly log-concave.
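The convolution description lends itself to a direct computation. The sketch below, with function names of our own choosing, builds the Whitney numbers of a chain product by repeatedly convolving with all-ones sequences and then checks the two conditions of Definition 6 on the resulting finite list. It is a numerical check on examples, not a substitute for the cited closure result.

```python
def convolve(a, b):
    """Convolution of two finite sequences given as lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def chain_product_whitney(sizes):
    """Whitney numbers of [N_1] x ... x [N_p], listed from rank 0 upward."""
    w = [1]
    for n in sizes:
        w = convolve(w, [1] * n)  # the chain [n] has Whitney numbers 1, ..., 1
    return w

def properly_log_concave_on_support(w):
    """True if every entry is positive and w[t-1] * w[t+1] <= w[t]^2 throughout."""
    return all(x > 0 for x in w) and all(
        w[t - 1] * w[t + 1] <= w[t] ** 2 for t in range(1, len(w) - 1)
    )

w = chain_product_whitney([2, 3, 4])
print(w)  # [1, 3, 5, 6, 5, 3, 1]
print(properly_log_concave_on_support(w))  # True
```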

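All conditions of Lemma 11 have been verified, so the conclusion holds:
\[ \lim_{p \to \infty} \sigma_p \Pr(\zeta_p = \lfloor \sigma_p x + \mu_p \rfloor) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \]
uniformly for all $x \in \mathbb{R}$. Setting $x = 0$, we obtain
\[ \Pr(\zeta_p = \lfloor \mu_p \rfloor) \sim \frac{1}{\sqrt{2\pi} \, \sigma_p} = \left( \frac{\pi}{6} \sum_{j=1}^{p} (N_j^2 - 1) \right)^{-1/2}. \]
Finally, we observe that chain products have the Sperner property [8]. In particular,
the width $w([N_1] \times \cdots \times [N_p])$ is equal to the Whitney number
$W_{\lfloor \mu_p \rfloor} = N_1 N_2 \cdots N_p \Pr(\zeta_p = \lfloor \mu_p \rfloor)$, so that
\[ \frac{w([N_1] \times \cdots \times [N_p])}{N_1 N_2 \cdots N_p} \sim \left( \frac{\pi}{6} \sum_{j=1}^{p} (N_j^2 - 1) \right)^{-1/2}. \]
This completes the proof of the proposition. ■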
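The asymptotic statement can be compared with exact values for moderately large p. The sketch below is our own numerical illustration: it computes the distribution of the rank of a uniform random element by repeated convolution in floating point, takes its largest value (the largest Whitney number divided by the number of elements), and sets it against the Gaussian estimate from the proposition. The function names are hypothetical.

```python
from math import pi

def largest_whitney_ratio(sizes):
    """Largest Whitney number of [N_1] x ... x [N_p] over N_1 * ... * N_p."""
    dist = [1.0]  # distribution of the rank of a uniform random element
    for n in sizes:
        new = [0.0] * (len(dist) + n - 1)
        for i, x in enumerate(dist):
            for j in range(n):
                new[i + j] += x / n
        dist = new
    return max(dist)

def gaussian_estimate(sizes):
    """The quantity (pi/6 * sum_j (N_j^2 - 1))^(-1/2)."""
    return (pi / 6 * sum(n * n - 1 for n in sizes)) ** -0.5

for p in (6, 30, 120):
    sizes = [2, 3, 4] * (p // 3)
    print(p, largest_whitney_ratio(sizes) / gaussian_estimate(sizes))
```

The printed ratios should approach 1 as p grows, in line with the second claim of the proposition.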

As an instance of Proposition 3a, we derive Theorem 3: 

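Proof of Theorem 3. As noted in the proof of Theorem 1, we have
$\operatorname{conc}(X_j a_j) = \frac{1}{\mathbf{E}(X_j) + 1} \le \frac{1}{\lfloor \mathbf{E}(X_j) \rfloor + 1}$ for $1 \le j \le n$.
Since $a_1, a_2, \ldots, a_m$ are linearly independent, we have
\begin{align*}
\operatorname{conc}(AX) &= \prod_{i=1}^{m} \operatorname{conc}(X_i a_i + X_{m+i} a_i + X_{2m+i} a_i + \cdots + X_{(p-1)m+i} a_i) \\
&= \prod_{i=1}^{m} \operatorname{conc}(X_i + X_{m+i} + X_{2m+i} + \cdots + X_{(p-1)m+i}) \\
&\lesssim \prod_{i=1}^{m} \left( \frac{\pi}{6} \sum_{k=0}^{p-1} \left( (\lfloor \mathbf{E}(X_{km+i}) \rfloor + 1)^2 - 1 \right) \right)^{-1/2},
\end{align*}
where the last claim follows by Proposition 3a. Finally, by section 2.1, (2), we infer
Theorem 3. ■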